README

Replication data and code for "Political Alignment between Firms and Employees in the United States: Evidence from a new Dataset"

Political Science Research & Methods
Date: 04 March 2020
*******************************************************************************************************




Files for replication:

1) Data Files

01. R_shares.RDta: file containing aggregated donation shares of companies and employees, by types of office and electoral race. Used to produce Figure 2, Figure A1, A2, and A5.
02. main_results_ind.RDta: file containing individual-level donations matched to PACs, producing Table 5 and Table 6.
03. ind_dv.RDta: file containing individual-level data, producing Figure 3.
04. ind_contr_soc_naics.RDta: file containing all individual-level data without matching PACs, producing Table A4 and A6.
05. main_results_firm.RDta: file containing donation shares of companies and employees, aggregated at the company level, producing Table 4.
06. align_firmsize.RDta: file containing alignment scores at the company level as well as firm size, producing Figure 4a.
07. align_ceo.RDta: file containing alignment scores at the firm-occupation level, producing Figure 4b.

08. app_results_ind_match_60.RDta: appendix data, producing Table A8 (reproducing Table 6 with lower threshold for matching occupations).
09. app_results_firm.RDta: appendix data, producing Table A9 (reproducing Table 4 excluding 0 alignment scores).
10. app_results_ind.RDta: appendix data, producing Table A10 and A11 (reproducing Table 5 and Table 6 excluding 0 alignment scores).

11. firmnames.csv: file containing company names linked to Compustat GVKEY. Used for Figure A3a.
12. soc_2010_titles.csv: file containing occupation names in SOC 2010 format. Used for Figure A3b and Table A5 and A6.
13. naics_2012_titles.csv: file containing industry names in NAICS 2012 format. Used for Table A2, A3, and A4.
14. naics2totemp.csv: file containing 2016 US employment for 3-digit NAICS industries from the Bureau of Labor Statistics. Used for Table A4.
15. soc2totemp.csv: file containing 2016 US employment for 2-digit SOC occupation groups from the Bureau of Labor Statistics. Used for Table A6.
16. cv_replic_tab.csv: file containing results from the cross-validation exercise, producing Table A7.



2) Code Files

1. 01_main.R: Reproduction code for all regression tables and figures included in the main part of the article.
2. 02_appendix.R: Reproduction code for all regression tables and figures included in the online appendix.
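
Before running the scripts, it may help to confirm that every file listed above is present. The following is a minimal sketch, not part of the original replication code. It assumes that the numbers in front of the data file names (01., 02., ...) are list indices rather than part of the file names, and that all files sit in the current working directory. The helper name missing_files is hypothetical.

```r
# File names as listed in this README (numeric list prefixes assumed
# NOT to be part of the names).
data_files <- c(
  "R_shares.RDta", "main_results_ind.RDta", "ind_dv.RDta",
  "ind_contr_soc_naics.RDta", "main_results_firm.RDta",
  "align_firmsize.RDta", "align_ceo.RDta",
  "app_results_ind_match_60.RDta", "app_results_firm.RDta",
  "app_results_ind.RDta",
  "firmnames.csv", "soc_2010_titles.csv", "naics_2012_titles.csv",
  "naics2totemp.csv", "soc2totemp.csv", "cv_replic_tab.csv"
)
code_files <- c("01_main.R", "02_appendix.R")

# Hypothetical helper: return the subset of `files` not found in `dir`.
missing_files <- function(files, dir = ".") {
  files[!file.exists(file.path(dir, files))]
}

all_files <- c(data_files, code_files)
missing   <- missing_files(all_files)

if (length(missing) == 0) {
  message("All ", length(all_files), " listed files found.")
} else {
  message("Missing files: ", paste(missing, collapse = ", "))
}
```

If the scripts read the data relative to the working directory (an assumption; this README does not state it), they could then be run with source("01_main.R") and source("02_appendix.R").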





NOTE:

This file contains code and data to reproduce results exactly as in the submitted manuscript. Note that the additional robustness checks (Table A8, A9, A10, A11)
in the appendix use a slightly altered, updated dataset compared to the dataset used for the main results. Specifically, the data used in the appendix also 
exclude donations from individual donors outside the 50 US states (e.g. American Samoa, Puerto Rico, and some Canadian provinces). These 
donations were excluded from further analysis between the first submission of the paper and the production of the robustness checks. 

Excluding these observations has a small impact on the number of observations (e.g. Table A11, Column 1 uses 119,578 observations, but would have used 
122,956 had donors from outside the 50 US states been included), minuscule or no effects on the coefficients reported (e.g. Table A11, Column 1 reports a coefficient of 0.067 on the CEO dummy;
including donors from outside the 50 US states results in a coefficient of 0.066), and no impact on significance levels. Likewise, excluding donors from outside the 50 US states 
does not affect the results in the main part of the paper (e.g. Table 6, Column 1 reports a CEO dummy coefficient of 0.073; using the updated data
results in a coefficient of 0.074). Across all models, the reported coefficients of the variables of interest are either unaffected 
or marginally smaller/larger (i.e. +/- 0.001), and thus alter neither the substantive results nor the conclusions. If needed, I will happily 
provide data and code to reproduce the results in the main part of the paper (Table 4, 5, and 6) using the updated data, and to reproduce the robustness checks 
in the appendix (Table A8, A9, A10, and A11) using the non-updated data.
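
To put the figures quoted in this note in perspective, the short sketch below works through the arithmetic. It uses only the numbers stated above; the variable names are illustrative and do not appear in the replication scripts.

```r
# Observation counts quoted in the note (Table A11, Column 1).
n_updated  <- 119578  # updated data: donors in the 50 US states only
n_original <- 122956  # data including donors outside the 50 US states

n_dropped     <- n_original - n_updated
share_dropped <- n_dropped / n_original

# CEO dummy coefficients quoted in the note.
coef_diff_a11 <- abs(0.067 - 0.066)  # Table A11, Column 1
coef_diff_t6  <- abs(0.074 - 0.073)  # Table 6, Column 1

cat(sprintf(
  "Dropped observations: %s (%.1f%% of the larger sample)\n",
  format(n_dropped, big.mark = ","), 100 * share_dropped
))
cat(sprintf(
  "Coefficient differences: %.3f (Table A11), %.3f (Table 6)\n",
  coef_diff_a11, coef_diff_t6
))
```

The exclusion removes 3,378 observations, roughly 2.7 percent of the larger sample, and both quoted coefficient pairs differ by 0.001.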




Software versions:

All data analyses were conducted in RStudio using R version 3.6.1.


> sessionInfo()
R version 3.6.1 (2019-07-05)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 17763)

Matrix products: default

locale:
[1] LC_COLLATE=English_United Kingdom.1252  LC_CTYPE=English_United Kingdom.1252    LC_MONETARY=English_United Kingdom.1252
[4] LC_NUMERIC=C                            LC_TIME=English_United Kingdom.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] xtable_1.8-4    stargazer_5.2.2 lfe_2.8-3       Matrix_1.2-17   tidyr_0.8.3     rio_0.5.16      dplyr_0.8.3    

loaded via a namespace (and not attached):
 [1] zip_2.0.4         Rcpp_1.0.2        pillar_1.4.2      compiler_3.6.1    cellranger_1.1.0  forcats_0.4.0    
 [7] tools_3.6.1       zeallot_0.1.0     tibble_2.1.3      lattice_0.20-38   pkgconfig_2.0.2   rlang_0.4.0      
[13] openxlsx_4.1.0.1  rstudioapi_0.10   curl_4.0          haven_2.1.1       vctrs_0.2.0       hms_0.5.1        
[19] grid_3.6.1        tidyselect_0.2.5  glue_1.3.1        data.table_1.12.6 R6_2.4.0          readxl_1.3.1     
[25] foreign_0.8-71    Formula_1.2-3     purrr_0.3.2       magrittr_1.5      backports_1.1.4   assertthat_0.2.1 
[31] sandwich_2.5-1    crayon_1.3.4      zoo_1.8-6 
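
The attached package versions reported above can be compared against a local installation. The following is a minimal sketch and not part of the original replication code; the helper name check_versions is hypothetical. It only reports differences and does not install or change anything. Newer package versions may well reproduce the results, but this README does not confirm that.

```r
# Versions of the attached packages as reported by sessionInfo() above.
expected <- c(
  xtable    = "1.8-4",
  stargazer = "5.2.2",
  lfe       = "2.8-3",
  Matrix    = "1.2-17",
  tidyr     = "0.8.3",
  rio       = "0.5.16",
  dplyr     = "0.8.3"
)

# Hypothetical helper: one row per package with the expected version,
# the installed version (NA if the package is not installed), and
# whether the two are equal.
check_versions <- function(expected) {
  rows <- lapply(names(expected), function(pkg) {
    present <- requireNamespace(pkg, quietly = TRUE)
    data.frame(
      package   = pkg,
      expected  = expected[[pkg]],
      # Note: R prints installed versions with dots (e.g. 1.8.4 for 1.8-4).
      installed = if (present) as.character(utils::packageVersion(pkg)) else NA_character_,
      # Compare as version objects so that "1.8-4" and "1.8.4" are equal.
      match     = present &&
        utils::packageVersion(pkg) == package_version(expected[[pkg]]),
      stringsAsFactors = FALSE
    )
  })
  do.call(rbind, rows)
}

print(check_versions(expected))
```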